tl;dr - I think our data is just SUPER spatially autocorrelated. I’m not sure if a 1D spin test is able to produce nulls that are different enough from our data!!

My thought process: To create spatial autocorrelation preserving nulls, we can do a 1D spin test using data along a tract. Here, I want to take the correlation between distance_to_cortex and age_effect. I can “spin” one of these maps, take the correlation between the spun map and the 2nd map, and create a null distribution.

To do a 1D spin, imagine taking my 1D data (100 nodes along a tract), concatenating it with a backwards version of itself, and ‘gluing’ the two ends to create a circle. The spin test would be like turning a dial, with each turn preserving the spatial relationships between adjacent nodes while ‘shifting’ the actual data.

In R, I operationalize this ‘1D spin’ (or ‘shift’) by doing this concatenation: extended_profile <- c(tract_profile, rev(tract_profile[-length(tract_profile)]), tract_profile[-1])

I then use a window the size of my tract profile and shift along this extended_profile. Thus for each spin (or shift), a different node becomes my starting node. If my tract profile has length 100, then my shifted data might look like c(node_k, node_k+1, node_k+2… node_100-k). The length of the data would still be 100, but now the values are just shifted.

However, I want to spin/shift my data 10,000 times while only having 100 nodes (max number of unique spins with 100 nodes = 198 spins). So in DIPY, I sampled 10,000 points along my tract and calculated the distance to cortex at each of those points. When I do my spin test in R, I can start at each point in my data of length 10,000 and sample 100 equidistant points along the entire tract (max number of unique spins = 19,998). I take 10,000 of those permuted maps and correlate each with my age_effect map to create my null distribution.

Another thing to consider is that spins that are too similar to our starting data will not work! I tried a few ways to address this (spoiler: none of these really work well).

  1. Take spun data that is not correlated with the original data (i.e. setting a threshold for absolute correlation: < 0.3 or < 0.1)

  2. take spun data that has shifted the data at least X number of nodes away from node 0. However, a lot of the nulls we include here are still significantly correlated with the original data. But if we only include nulls that are super far away from 0, we get these interesting negatively skewed distributions. I THINK it’s because many of my nulls end up being something like node 50 to node 49 (which is anticorrelated with my original data) – everything is super significant!! LOL But I think p-values are probably artificially deflated…

setup

config_file="/cbica/projects/luo_wm_dev/code/tract_profiles/config/config_HCPD.json"
 
# Read config from the specified file
config <- fromJSON(file=config_file)

dataset = config['dataset']
data_root = config['data_root']
outputs_root = config['outputs_root']
outputs_dir = paste0(outputs_root, "/all_subjects")
spin_dir = paste0(outputs_root, "/1d_spin_corrs")

 
    
if (!dir.exists(spin_dir)) {
  # If directory doesn't exist, create it
  dir.create(spin_dir, recursive = TRUE)
  print(paste("Directory", spin_dir, "created."))
} else {
  print(paste("Directory", spin_dir, "already exists."))
}
## [1] "Directory /cbica/projects/luo_wm_dev/output/HCPD/tract_profiles/1d_spin_corrs already exists."

functions

fix_tract_orientations <- function(df) {
  # flip tracts
  to_reorient <- df %>%
    group_by(tractID) %>%
    mutate(nodeID = ifelse(tractID %in% c("Cingulum Cingulate", "Inferior Fronto-occipital Fasciculus", "Inferior Longitudinal Fasciculus", "Uncinate Fasciculus", "Corticospinal Tract"), max(nodeID) - nodeID, nodeID))
  
  # label main orientation
  reoriented <- to_reorient %>% 
    mutate(main_orientation = case_when(
      tractID %in% c("Anterior Thalamic Radiation", "Cingulum Cingulate", "Inferior Fronto-occipital Fasciculus",
                     "Inferior Longitudinal Fasciculus", "Superior Longitudinal Fasciculus") ~ "AP",
      tractID %in% c("Arcuate Fasciculus", "Uncinate Fasciculus") ~ "AP_frontal_temporal",
      tractID %in% c("Forceps Minor", "Forceps Major") ~ "RL",
      tractID %in% c("Corticospinal Tract", "Posterior Arcuate","Vertical Occipital Fasciculus") ~ "SI",
      TRUE ~ NA_character_
    ))
  
  reoriented$main_orientation <- as.factor(reoriented$main_orientation)
  
  return(reoriented)
}

arrange_by_orientation <- function(list_figures, title, figure_index = NULL) {
  
  if(is.null(figure_index)) {
    AP_plots <- ggarrange(list_figures$`Anterior Thalamic Radiation`, 
                           list_figures$`Cingulum Cingulate`, 
                           list_figures$`Inferior Fronto-occipital Fasciculus`, 
                           list_figures$`Inferior Longitudinal Fasciculus`, 
                           list_figures$`Superior Longitudinal Fasciculus`, ncol=5, nrow = 1)
  AP_plots_final <- annotate_figure(AP_plots, top = text_grob("Anterior - Posterior", 
                 color = "black", face = "bold", size = 14))
  
  
  AP_frontotemp_plots <- ggarrange(list_figures$`Arcuate Fasciculus`,
                                      list_figures$`Uncinate Fasciculus`, ncol=2, nrow = 1)
  AP_frontotemp_plots_final <- annotate_figure(AP_frontotemp_plots, top = text_grob("Anterior - Posterior (Frontal - Temporal)", 
                 color = "black", face = "bold", size = 14))
  
   
  RL_plots <- ggarrange(list_figures$`Forceps Major`, list_figures$`Forceps Minor`, 
                           ncol=2, nrow = 1)
  RL_plots_final <- annotate_figure(RL_plots, top = text_grob("Right - Left", 
                 color = "black", face = "bold", size = 14))
  
  
  SI_plots <- ggarrange(list_figures$`Corticospinal Tract`, list_figures$`Posterior Arcuate`, 
                           list_figures$`Vertical Occipital Fasciculus`, common.legend=TRUE, legend=c("right"),
                           ncol=3, nrow = 1)
  SI_plots_final <- annotate_figure(SI_plots, top = text_grob("Superior - Inferior", 
                 color = "black", face = "bold", size = 14))
  } else {
    AP_plots <- ggarrange(list_figures$`Anterior Thalamic Radiation`[[figure_index]], 
                           list_figures$`Cingulum Cingulate`[[figure_index]], 
                           list_figures$`Inferior Fronto-occipital Fasciculus`[[figure_index]], 
                           list_figures$`Inferior Longitudinal Fasciculus`[[figure_index]], 
                           list_figures$`Superior Longitudinal Fasciculus`[[figure_index]], ncol=5, nrow = 1)
  AP_plots_final <- annotate_figure(AP_plots, top = text_grob("Anterior - Posterior", 
                 color = "black", face = "bold", size = 14))
  
  
  AP_frontotemp_plots <- ggarrange(list_figures$`Arcuate Fasciculus`[[figure_index]],
                                      list_figures$`Uncinate Fasciculus`[[figure_index]], ncol=2, nrow = 1)
  AP_frontotemp_plots_final <- annotate_figure(AP_frontotemp_plots, top = text_grob("Anterior - Posterior (Frontal - Temporal)", 
                 color = "black", face = "bold", size = 14))
  
   
  RL_plots <- ggarrange(list_figures$`Forceps Major`[[figure_index]], list_figures$`Forceps Minor`[[figure_index]], 
                           ncol=2, nrow = 1)
  RL_plots_final <- annotate_figure(RL_plots, top = text_grob("Right - Left", 
                 color = "black", face = "bold", size = 14))
  
  
  SI_plots <- ggarrange(list_figures$`Corticospinal Tract`[[figure_index]], list_figures$`Posterior Arcuate`[[figure_index]], 
                           list_figures$`Vertical Occipital Fasciculus`[[figure_index]], common.legend=TRUE, legend=c("right"),
                           ncol=3, nrow = 1)
  SI_plots_final <- annotate_figure(SI_plots, top = text_grob("Superior - Inferior", 
                 color = "black", face = "bold", size = 14))
  }
  
  
  tractprofiles_plot <- ggarrange(AP_plots_final, ggarrange(AP_frontotemp_plots_final, RL_plots_final, ncol = 2), SI_plots_final, nrow = 3)
  
  tractprofiles_plot_final <- annotate_figure(tractprofiles_plot, top = text_grob(title, 
               color = "black", face = "italic", size = 18))
   return(tractprofiles_plot_final)
}


load_bilat_distances <- function(dataset, type) {
  if(type == "10k") {
    bilat_gm_dist <- read.csv(sprintf("/cbica/projects/luo_wm_dev/output/%1$s/tract_profiles/all_subjects/across_subjects_bilateral_distances_to_gm_10k.csv",dataset))
  } else {
    bilat_gm_dist <- read.csv(sprintf("/cbica/projects/luo_wm_dev/output/%1$s/tract_profiles/all_subjects/across_subjects_bilateral_distances_to_gm.csv",dataset))
  }
   
  bilat_gm_dist <- bilat_gm_dist %>% mutate(
      
      tractID = case_when(
        bundle_name == "ATR" ~ "Anterior Thalamic Radiation",
        bundle_name == "CGC" ~ "Cingulum Cingulate",
        bundle_name == "CST" ~ "Corticospinal Tract",
        bundle_name == "IFO" ~ "Inferior Fronto-occipital Fasciculus",
        bundle_name == "ILF" ~ "Inferior Longitudinal Fasciculus",
        bundle_name == "SLF" ~ "Superior Longitudinal Fasciculus",
        bundle_name == "ARC" ~ "Arcuate Fasciculus",
        bundle_name == "UNC" ~ "Uncinate Fasciculus",
        bundle_name == "FA" ~ "Forceps Minor",
        bundle_name == "FP" ~ "Forceps Major",
        bundle_name == "pARC" ~ "Posterior Arcuate",
        bundle_name == "VOF" ~ "Vertical Occipital Fasciculus",
        TRUE ~ bundle_name
      )
    )
  bilat_gm_dist <- bilat_gm_dist %>% relocate(tractID)
  bilat_gm_dist <- bilat_gm_dist[,-2]
  
  if(type == "10k") {
    bilat_gm_dist_long <- bilat_gm_dist %>% pivot_longer(names_to="dist_to_gm", cols = names(bilat_gm_dist)[c(2:10001)])
  } else {
    bilat_gm_dist_long <- bilat_gm_dist %>% pivot_longer(names_to="dist_to_gm", cols = names(bilat_gm_dist)[c(2:101)])
  }
  bilat_gm_dist_long <- bilat_gm_dist_long %>% mutate(tract_node_noHemi = paste0(tractID, "_", gsub("_dist", "", dist_to_gm)))
  bilat_gm_dist_long$tract_node_noHemi <- gsub("_node", "", bilat_gm_dist_long$tract_node_noHemi)
   
  bilat_gm_dist_long$nodeID <- gsub("_dist", "", bilat_gm_dist_long$dist_to_gm)
  bilat_gm_dist_long$nodeID <- gsub("node_", "", bilat_gm_dist_long$nodeID)
  bilat_gm_dist_long$nodeID <- as.numeric(bilat_gm_dist_long$nodeID)
  
  bilat_gm_dist_long <- fix_tract_orientations(bilat_gm_dist_long)
  bilat_gm_dist_long <- bilat_gm_dist_long %>% mutate(tract_node_noHemi = paste0(tractID, "_",nodeID))
  
  
  #bilat_gm_dist_toplot <- merge(bilat_gm_dist_long, gam_age_md_bilat, by = "tract_node_noHemi")
  #bilat_gm_dist_toplot <- bilat_gm_dist_toplot %>% select(-tractID.y, -nodeID.y) %>% rename(tractID = tractID.x,
                                                                                           # nodeID = nodeID.x)
  
  bilat_gm_dist_toplot <- bilat_gm_dist_long %>% select(-dist_to_gm) %>% rename(dist_to_gm = value)  
  bilat_gm_dist_toplot <- bilat_gm_dist_toplot %>% arrange(tractID, nodeID)
  return(bilat_gm_dist_toplot)
}


# Function for generating a permutation map from tract profile data
# note that input tract profile data can be densely sampled (i.e. 10,000 nodes) to increase number of permutations. 
# synonymous to https://github.com/frantisekvasa/rotate_parcellation/blob/master/R/rotate.parcellation.R, but 1D!

# @param tract_profile A vector of integers representing data along a tract (such as tract profile or distances to cortex)
# @param num_samples An integer for number of samples to draw from tract_profile data (default is 100)
# @param nrot An integer for number of 1D rotations (or shifts; default is 10000)
# @param seed An integer for setting a seed. Helpful for reproducing the rotations. 
# output: a dataframe of permutations with nrow = nrot and ncol = num_samples 
rotate_tractprofiles <- function(tract, num_samples = 100, nrot = 10000, seed = NULL) {
  tract_profile <- bilat_gm_dist_toplot %>% filter(tractID==tract)
  tract_profile <- tract_profile$dist_to_gm
  original_length <- length(tract_profile)
  
  # create an extended profile (functionally similar to creating a 1D circle of our data)
  extended_profile <- c(tract_profile, rev(tract_profile[-length(tract_profile)]), tract_profile[-1])
  extended_n <- length(extended_profile)
 
  perm.id <- matrix(NA, nrow = extended_n - original_length + 1, ncol = num_samples)
  
  # starting at each point in my length 10k data, sample 100 equidistant points along the entire tract. 
  # this is how i'm generating the permuted or spun data. By shifting the window along my 'extended_profile' (functionally equivalent to 'spinning the dial' on a wheel),
  # i can sample 100 equidistant points along the tract that preserves spatial autocorrelation. But the data is now shifted!
  #print(paste("Rotating tract profiles data for", tract))
  for (shift in 1:(extended_n - original_length + 1)) {
    sample_indices <- seq(shift, by = original_length / num_samples, length.out = num_samples) %% (extended_n - 1) + 1
    shifted_data <- extended_profile[sample_indices]
    perm.id[shift, ] <- shifted_data
  }
  
  perm.id_df <- data.frame(perm.id)
  perm.id_df <- perm.id_df[!duplicated(perm.id_df),] 

  # set seed for reproducibility
  if (!is.null(seed)) {
    set.seed(seed)
  }
  
  # randomly shuffle the order of the rows and then extract nrot number of permutations
  # each row = 1 rotation
  # each column = node position
  perm.id_df_shuffled <- perm.id_df[sample(nrow(perm.id_df)), ]
  perm.id_final <- perm.id_df_shuffled[1:nrot,]
 
 
  return(perm.id_final)
}

# i have 10,000 rows. i want to correlate each row with my vector (age_effect) and save out all the correlation
# fyi: only map.x has been spun (it's hard to spin both maps since it would require dense sampling for both maps to get to 10k permutations)
# output: p_perm and graphs
perm.circle.p <- function(tract, perm_map.x, map.x, map.y, corr.type="pearson") {
  
  #empirical correlation
  rho.emp = cor(map.x,map.y,method=corr.type)  
  
  # set up dataframes to store null correlations and p-value
  nperm <- nrow(perm_map.x)
  rho.null.x <- numeric(nperm)
  p.perm.x = numeric(nperm)
  
  #print("correlation to rotated tract profiles")
  for (i in 1:nrow(perm_map.x)) {
    rho.null.x[i] <- cor.test(as.numeric(perm_map.x[i,]), map.y, method = corr.type)$estimate
  }
  
  # calculate p-value 
  if (rho.emp > 0) {
    p.perm.x = sum(rho.null.x > rho.emp) / nperm
  } else {
    p.perm.x = sum(rho.null.x < rho.emp) / nperm
  }
   
 
  df <- data.frame(value = rho.null.x)
  df <- df %>% mutate(tractID = tract) %>% 
    mutate(rho.emp = rho.emp, p.perm.x=p.perm.x) 
  
  plot <- ggplot(df, aes(x = value)) +
    geom_histogram(fill = '#99C9C7', color = '#99C9C7', alpha = 0.6) +
    geom_segment(aes(x=rho.emp, xend = rho.emp, y = 0, yend = 900), linetype="dashed", color = "#c04b29") + 
    annotate("text", x=rho.emp-0.15,y=900, label = substitute(atop(r == rho.emp_value, p[spin] == p.perm.x_value), 
                                list(rho.emp_value = round(rho.emp, 2), p.perm.x_value = p.perm.x)), hjust=0.5, vjust= -0.5, color="#c04b29", size = 4) + theme_classic() + 
    theme(plot.title = element_text(hjust=0.5),
          axis.title.x=element_blank(),
          axis.title.y=element_blank(),
          axis.text.x = element_text(size = 14),
          axis.text.y = element_text(size=14))+ 
    labs(title = tract) + coord_cartesian(ylim = c(0, 1300), xlim=c(-1,1.15))  
  
  return(plot) # can edit to return p value only 
}


spin_test_wrapper <- function(tract) {
  perm_map.x <- perm.id[[tract]]
  map.x <- bilat_gm_dist_orig %>% filter(tractID==tract) %>% pull(dist_to_gm)
  map.y <- gam_age_md_bilat %>% filter(tractID==tract) %>% pull(s_age.delta.adj.rsq_signed)
  ##print(paste("spinning", tract))
  perm_tract <- perm.circle.p(tract, perm_map.x, map.x, map.y)
  return(perm_tract)
}

load files

  • age effect (computed at 100 nodes)
  • original bilateral distances to cortex (sampled at 100 nodes)
  • densely sampled bilateral distances to cortex (sampled at 10k nodes): used for generating nulls
# load age effect
gam_age_md_bilat <- read.csv(sprintf("%1$s/GAM/dti_md/GAM_bilateral_signedresults.dti_md.age.csv", outputs_root))


# load bilateral distances to cortex - original data (sampled at 100 nodes)
bilat_gm_dist_orig <- load_bilat_distances("HCPD", "")


# load bilateral distances to cortex (sampled at 10k nodes)
bilat_gm_dist_toplot <- load_bilat_distances("HCPD", "10k")
#write.csv(bilat_gm_dist_toplot, "/cbica/projects/luo_wm_dev/output/HCPD/tract_profiles/all_subjects/across_subjects_bilateral_distances_to_gm_oriented_long_10k.csv")

Spin test

  • Starting at each point in my length 10k cortical distance data, sample 100 equidistant points along the entire tract.
  • I’m operationalizing the ‘spin’ by creating an ‘extended_profile’ (c(tract_profile, rev(tract_profile[-length(tract_profile)]), tract_profile[-1])). Then shifting a window along this ‘extended_profile’. This is functionally equivalent to ‘spinning the dial’
  • I can sample 100 equidistant points along the spun tract. This null data preserves spatial autocorrelation but is shifted from the original tract profile.
  • Then for each null tract profile (with each point = distance to cortex), I correlate it with my age effect map to generate a null distribution of Pearson correlations
perm.id <- lapply(unique(bilat_gm_dist_toplot$tractID), rotate_tractprofiles, nrot=10000, seed=123) 
names(perm.id) <- unique(bilat_gm_dist_toplot$tractID)
 
spin_test_tracts <- lapply(unique(bilat_gm_dist_toplot$tractID), spin_test_wrapper)
names(spin_test_tracts) <- unique(bilat_gm_dist_toplot$tractID)

Plot distributions of null Pearson correlations

  • ugh the distributions are SO WIDE. I think the data is just super spatially autocorrelated. Null distributions from data that was shifted just 1 or 2 nodes might be way too similar to the original data.
# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("1D Spin Null Distribution of Pearson Correlations", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("Frequency", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)

spin_test_tracts_final <- arrange_by_orientation(spin_test_tracts, "Distance to Cortex vs. Age Effect")
grid.arrange(arrangeGrob(spin_test_tracts_final, left = y.grob, bottom = x.grob, right = space.grob))

How correlated is the original tract profile data with the nulls that start at node 10? 25? 33?

check_tractprofile_nulls <- function(tract, num_samples = 100, nrot = 10000, seed = NULL, return_measure) {
  tract_profile <- bilat_gm_dist_toplot %>% filter(tractID==tract)
  tract_profile <- tract_profile$dist_to_gm
  original_length <- length(tract_profile)
  
  # create an extended profile (functionally similar to creating a 1D circle of our data)
  extended_profile <- c(tract_profile, rev(tract_profile[-length(tract_profile)]), tract_profile[-1])
  extended_n <- length(extended_profile)
 
  perm.id <- matrix(NA, nrow = extended_n - original_length + 1, ncol = num_samples)
  
  # starting at each point in my length 10k data, sample 100 equidistant points along the entire tract. 
  # this is how i'm generating the permutated or spun data. By shifting the window along my 'extended_profile' (functionally equivalent to 'spinning the dial' on a wheel),
  # i can sample 100 equidistant points along the tract that preserves spatial autocorrelation. But the data is now shifted!
  #print(paste("Rotating tract profiles data for", tract))
  for (shift in 1:(extended_n - original_length + 1)) {
    sample_indices <- seq(shift, by = original_length / num_samples, length.out = num_samples) %% (extended_n - 1) + 1
    shifted_data <- extended_profile[sample_indices]
    perm.id[shift, ] <- shifted_data
  }
  
  perm.id_df <- data.frame(perm.id)
  perm.id_df <- perm.id_df[!duplicated(perm.id_df),] 
  
  
  orig <- bilat_gm_dist_orig %>% filter(tractID == tract)
  cors <-  matrix(NA, nrow(perm.id_df), ncol = 2)
  pvals <- matrix(NA, nrow(perm.id_df), ncol = 2)
  for (row in 1:nrow(perm.id_df)) {
    cor.test.output <- cor.test(orig$dist_to_gm, as.numeric(perm.id_df[row,]))
    cors[row,2] <- cor.test.output$estimate
    if (row < 10000) {
      node_position <- row
    } else {
      node_position <- row - (row-10000)*2
    }
    
    cors[row,1] <- node_position
    
    pvals[row,2] <- cor.test.output$p.value
    pvals[row,1] <- node_position
  }
  
   cors <- data.frame(cors) %>% setNames(c("node_position_10k", "correlation_with_orig_data"))
   pvals <- data.frame(pvals) %>% setNames(c("node_position_10k", "pval"))
   plot_cors <- ggplot(cors, aes(x = node_position_10k/100, y=correlation_with_orig_data)) +
    geom_point(fill = '#99C9C7', color = '#99C9C7', alpha = 0.6) + theme_classic() +
     geom_hline(yintercept=0, linetype="dashed", 
                color = "black", size=1) +
    theme(plot.title = element_text(hjust=0.5),
          axis.title.x=element_blank(),
          axis.title.y=element_blank(),
          axis.text.x = element_text(size = 14),
          axis.text.y = element_text(size=14))+ 
    labs(title = tract)  
   
   plot_pvals <- ggplot(pvals, aes(x = node_position_10k/100, y=pval)) +
    geom_point(fill = '#702963', color = '#702963', alpha = 0.6) + theme_classic() +
     geom_hline(yintercept=0.05, linetype="dashed", 
                color = "black", size=1) + 
    theme(plot.title = element_text(hjust=0.5),
          axis.title.x=element_blank(),
          axis.title.y=element_blank(),
          axis.text.x = element_text(size = 14),
          axis.text.y = element_text(size=14))+ 
    labs(title = tract) 
  return(list(plot_cors, plot_pvals, cors, pvals))
}

Correlation between original tract profile data and 1D spin-test generated nulls

  • it looks like only a small number of nulls are NOT correlated to the original data. And those nulls are typically shifted by 25 or 75 nodes
  • The nulls where the first node now starts at node=50 tend to be perfectly anticorrelated with my original data (makes sense…should’ve thought about that lol)
null_cors <- lapply(unique(bilat_gm_dist_toplot$tractID), check_tractprofile_nulls, nrot=10000, seed=123) 
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
names(null_cors) <- unique(bilat_gm_dist_toplot$tractID)
 

# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("Starting Node of Null Tract Profile", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("Correlation", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)
 
check_cors_final <- arrange_by_orientation(null_cors, "Correlations Between Null and Original Tract Profile", figure_index = 1)
grid.arrange(arrangeGrob(check_cors_final, left = y.grob, bottom = x.grob, right = space.grob))

Significance of the correlations between original tract profile data and 1D spin-test generated nulls

  • basically seeing a similar pattern as above (here I just plotted the p-values of those correlations)
# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("Starting Node of Null Tract Profile", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("P-value of Correlation", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)

check_cors_final <- arrange_by_orientation(null_cors, "Pvalues of Cors Between Null and Original Tract Profile", figure_index = 2)
grid.arrange(arrangeGrob(check_cors_final, left = y.grob, bottom = x.grob, right = space.grob))

rotate_tractprofiles <- function(tract, num_samples=100, threshold, type) {
  tract_profile <- bilat_gm_dist_toplot %>% filter(tractID==tract)
  tract_profile <- tract_profile$dist_to_gm
  original_length <- length(tract_profile)
  
  # create an extended profile (functionally similar to creating a 1D circle of our data)
  extended_profile <- c(tract_profile, rev(tract_profile[-length(tract_profile)]), tract_profile[-1])
  extended_n <- length(extended_profile)
 
  perm.id <- matrix(NA, nrow = extended_n - original_length + 1, ncol = num_samples)
  
  # starting at each point in my length 10k data, sample 100 equidistant points along the entire tract. 
  # this is how i'm generating the permutated or spun data. By shifting the window along my 'extended_profile' (functionally equivalent to 'spinning the dial' on a wheel),
  # i can sample 100 equidistant points along the tract that preserves spatial autocorrelation. But the data is now shifted!
  #print(paste("Rotating tract profiles data for", tract))
  for (shift in 1:(extended_n - original_length + 1)) {
    sample_indices <- seq(shift, by = original_length / num_samples, length.out = num_samples) %% (extended_n - 1) + 1
    shifted_data <- extended_profile[sample_indices]
    perm.id[shift, ] <- shifted_data
  }
  
  perm.id_df <- data.frame(perm.id)
  perm.id_df <- perm.id_df[!duplicated(perm.id_df),] 
  
  if (type == "correlation") {
      indices <- null_cors[[tract]][[3]][,1][(null_cors[[tract]][[3]]$correlation_with_orig_data < threshold & null_cors[[tract]][[3]]$correlation_with_orig_data > -threshold)] # keep only the spins that are correlated at a magnitude of less than 0.3 to the original data
  
  } else {
      indices <- null_cors[[tract]][[4]][,1][(null_cors[[tract]][[4]]$pval > 0.05)] # keep only the spins that are correlated at a magnitude of less than 0.3 to the original data
  
  }
  perm.id_df <- perm.id_df[indices,]
  return(perm.id_df)
}
 


perm.circle.p <- function(tract, perm_map.x, map.x, map.y, corr.type="pearson") {
  
  #empirical correlation
  rho.emp = cor(map.x,map.y,method=corr.type)  
  
  # set up dataframes to store null correlations and p-value
  nperm <- nrow(perm_map.x)
  rho.null.x <- numeric(nperm)
  p.perm.x = numeric(nperm)
  
  #print("correlation to rotated tract profiles")
  for (i in 1:nrow(perm_map.x)) {
    rho.null.x[i] <- cor.test(as.numeric(perm_map.x[i,]), map.y, method = corr.type)$estimate
  }
  
  # calculate p-value 
  if (rho.emp > 0) {
    p.perm.x = sum(rho.null.x > rho.emp) / nperm
  } else {
    p.perm.x = sum(rho.null.x < rho.emp) / nperm
  }
   
 
  df <- data.frame(value = rho.null.x)
  df <- df %>% mutate(tractID = tract) %>% 
    mutate(rho.emp = rho.emp, p.perm.x=p.perm.x) 
  
  plot <- ggplot(df, aes(x = value)) +
    geom_histogram(fill = '#99C9C7', color = '#99C9C7', alpha = 0.6) +
    geom_segment(aes(x=rho.emp, xend = rho.emp, y = 0, yend = 500), linetype="dashed", color = "#c04b29") + 
    annotate("text", x=rho.emp-0.15,y=500, label = substitute(atop(r == rho.emp_value, p[spin] == p.perm.x_value), 
                                list(rho.emp_value = round(rho.emp, 2), p.perm.x_value = p.perm.x)), hjust=0.5, vjust= -0.5, color="#c04b29", size = 4) + theme_classic() + 
    theme(plot.title = element_text(hjust=0.5),
          axis.title.x=element_blank(),
          axis.title.y=element_blank(),
          axis.text.x = element_text(size = 14),
          axis.text.y = element_text(size=14))+ 
    labs(title = tract) + coord_cartesian(ylim = c(0, 800), xlim=c(-1,1.15))  
  
  return(plot) # can edit to return p value only 
}

get the node positions where pval > 0.05 or correlation is less than a certain value - for each tract, get the node positions are of cor < 0.1

Create null distributions with nulls that have an absolute corr of < 0.1 to the original data

  • then take the correlation for each of those nulls to the age map to generate null distribution of Pearson correlations
  • there are these odd bimodal distributions. Also Cingulum cingulate, ILF and IFOF are particularly weird looking!!
perm.id <- lapply(unique(bilat_gm_dist_toplot$tractID), rotate_tractprofiles, threshold=0.1, type = "correlation") 
names(perm.id) <- unique(bilat_gm_dist_toplot$tractID)
 
spin_test_tracts <- lapply(unique(bilat_gm_dist_toplot$tractID), spin_test_wrapper)
names(spin_test_tracts) <- unique(bilat_gm_dist_toplot$tractID)

# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("1D Spin Null Distribution of Pearson Correlations", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("Frequency", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)

spin_test_tracts_final <- arrange_by_orientation(spin_test_tracts, "Distance to Cortex vs. Age Effect")
grid.arrange(arrangeGrob(spin_test_tracts_final, left = y.grob, bottom = x.grob, right = space.grob))

Create null distributions with nulls that have abs cor of < 0.3 to the original data

  • bimodal patterns are still there
perm.id <- lapply(unique(bilat_gm_dist_toplot$tractID), rotate_tractprofiles, threshold=0.3, type = "correlation") 
names(perm.id) <- unique(bilat_gm_dist_toplot$tractID)
 
spin_test_tracts <- lapply(unique(bilat_gm_dist_toplot$tractID), spin_test_wrapper)
names(spin_test_tracts) <- unique(bilat_gm_dist_toplot$tractID)

# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("1D Spin Null Distribution of Pearson Correlations", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("Frequency", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)

spin_test_tracts_final <- arrange_by_orientation(spin_test_tracts, "Distance to Cortex vs. Age Effect")
grid.arrange(arrangeGrob(spin_test_tracts_final, left = y.grob, bottom = x.grob, right = space.grob))

Create null distributions with nulls that have pval > 0.05 for their corrs with original data

perm.id <- lapply(unique(bilat_gm_dist_toplot$tractID), rotate_tractprofiles, threshold=NULL, type = "pval") 
names(perm.id) <- unique(bilat_gm_dist_toplot$tractID)
 
spin_test_tracts <- lapply(unique(bilat_gm_dist_toplot$tractID), spin_test_wrapper)
names(spin_test_tracts) <- unique(bilat_gm_dist_toplot$tractID)

# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("1D Spin Null Distribution of Pearson Correlations", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("Frequency", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)

spin_test_tracts_final <- arrange_by_orientation(spin_test_tracts, "Distance to Cortex vs. Age Effect")
grid.arrange(arrangeGrob(spin_test_tracts_final, left = y.grob, bottom = x.grob, right = space.grob))

## These are other checks I had done that are interesting but don’t work lol - Essentially, only take spun data that has shifted the data at least X number of nodes away from node 0. However this doesn’t work because a lot of the nulls we include here are still significantly correlated with the original data

Only take shifts that are 10 or more nodes away from node=0 and node=99

rotate_tractprofiles <- function(tract, num_samples = 100, nrot = 10000, seed = NULL) {
  tract_profile <- bilat_gm_dist_toplot %>% filter(tractID==tract)
  tract_profile <- tract_profile$dist_to_gm
  original_length <- length(tract_profile)
  
  # create an extended profile (functionally similar to creating a 1D circle of our data)
  extended_profile <- c(tract_profile, rev(tract_profile[-length(tract_profile)]), tract_profile[-1])
  extended_n <- length(extended_profile)
 
  perm.id <- matrix(NA, nrow = extended_n - original_length + 1, ncol = num_samples)
  
  # starting at each point in my length 10k data, sample 100 equidistant points along the entire tract. 
  # this is how i'm generating the permutated or spun data. By shifting the window along my 'extended_profile' (functionally equivalent to 'spinning the dial' on a wheel),
  # i can sample 100 equidistant points along the tract that preserves spatial autocorrelation. But the data is now shifted!
  #print(paste("Rotating tract profiles data for", tract))
  for (shift in 1:(extended_n - original_length + 1)) {
    sample_indices <- seq(shift, by = original_length / num_samples, length.out = num_samples) %% (extended_n - 1) + 1
    shifted_data <- extended_profile[sample_indices]
    perm.id[shift, ] <- shifted_data
  }
  
  perm.id_df <- data.frame(perm.id)
  perm.id_df <- perm.id_df[!duplicated(perm.id_df),] 
 
   # remove the first and last 10% shifts (only take shifts that are 10 or more nodes away from node=0 and node=99)
  first_starting_node <- original_length/10
  last_starting_node <- original_length - first_starting_node - 1
  first_starting_node_extended <- last_starting_node + first_starting_node*2 + 1
  last_starting_node_extended <- first_starting_node_extended + (last_starting_node - first_starting_node) 
  perm.id_df <- perm.id_df[c(first_starting_node:last_starting_node, first_starting_node_extended:last_starting_node_extended),] 
   
  # set seed for reproducibility
  if (!is.null(seed)) {
    set.seed(seed)
  }
  
  # randomly shuffle the order of the rows and then extract nrot number of permutations
  # each row = 1 rotation
  # each column = node position
  perm.id_df_shuffled <- perm.id_df[sample(nrow(perm.id_df)), ]
  perm.id_final <- perm.id_df_shuffled[1:nrot,]
 
 
  return(perm.id_final)
}
perm.id <- lapply(unique(bilat_gm_dist_toplot$tractID), rotate_tractprofiles, nrot=10000, seed=123) 
names(perm.id) <- unique(bilat_gm_dist_toplot$tractID)
 
spin_test_tracts <- lapply(unique(bilat_gm_dist_toplot$tractID), spin_test_wrapper)
names(spin_test_tracts) <- unique(bilat_gm_dist_toplot$tractID)
# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("1D Spin Null Distribution of Pearson Correlations", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("Frequency", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)

spin_test_tracts_final <- arrange_by_orientation(spin_test_tracts, "Distance to Cortex vs. Age Effect")
grid.arrange(arrangeGrob(spin_test_tracts_final, left = y.grob, bottom = x.grob, right = space.grob))

Only take shifts that are 25 or more nodes away from node=0 and node=99

rotate_tractprofiles <- function(tract, num_samples = 100, nrot = 10000, seed = NULL) {
  tract_profile <- bilat_gm_dist_toplot %>% filter(tractID==tract)
  tract_profile <- tract_profile$dist_to_gm
  original_length <- length(tract_profile)
  
  # create an extended profile (functionally similar to creating a 1D circle of our data)
  extended_profile <- c(tract_profile, rev(tract_profile[-length(tract_profile)]), tract_profile[-1])
  extended_n <- length(extended_profile)
 
  perm.id <- matrix(NA, nrow = extended_n - original_length + 1, ncol = num_samples)
  
  # starting at each point in my length 10k data, sample 100 equidistant points along the entire tract. 
  # this is how i'm generating the permutated or spun data. By shifting the window along my 'extended_profile' (functionally equivalent to 'spinning the dial' on a wheel),
  # i can sample 100 equidistant points along the tract that preserves spatial autocorrelation. But the data is now shifted!
  #print(paste("Rotating tract profiles data for", tract))
  for (shift in 1:(extended_n - original_length + 1)) {
    sample_indices <- seq(shift, by = original_length / num_samples, length.out = num_samples) %% (extended_n - 1) + 1
    shifted_data <- extended_profile[sample_indices]
    perm.id[shift, ] <- shifted_data
  }
  
  perm.id_df <- data.frame(perm.id)
  perm.id_df <- perm.id_df[!duplicated(perm.id_df),] 
 
  # remove the first and last 25% shifts (only take shifts that are 25 or more nodes away from node=0 and node=99)
  first_starting_node <- original_length/4
  last_starting_node <- original_length - first_starting_node - 1
  first_starting_node_extended <- last_starting_node + first_starting_node*2 + 1
  last_starting_node_extended <- first_starting_node_extended + (last_starting_node - first_starting_node) 
  
  perm.id_df <- perm.id_df[c(first_starting_node:last_starting_node, first_starting_node_extended:last_starting_node_extended),] 
   
  # set seed for reproducibility
  if (!is.null(seed)) {
    set.seed(seed)
  }
  
  # randomly shuffle the order of the rows and then extract nrot number of permutations
  # each row = 1 rotation
  # each column = node position
  perm.id_df_shuffled <- perm.id_df[sample(nrow(perm.id_df)), ]
  perm.id_final <- perm.id_df_shuffled[1:nrot,]

  return(perm.id_final)
}
perm.id <- lapply(unique(bilat_gm_dist_toplot$tractID), rotate_tractprofiles, nrot=10000, seed=123) 
names(perm.id) <- unique(bilat_gm_dist_toplot$tractID)
 
spin_test_tracts <- lapply(unique(bilat_gm_dist_toplot$tractID), spin_test_wrapper)
names(spin_test_tracts) <- unique(bilat_gm_dist_toplot$tractID)
  • these distributions are a bit negatively skewed. I THINK it’s because many of my nulls end up being something like node 50 to node 49 (which is anticorrelated with my original data)
# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("1D Spin Null Distribution of Pearson Correlations", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("Frequency", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)

spin_test_tracts_final <- arrange_by_orientation(spin_test_tracts, "Distance to Cortex vs. Age Effect")
grid.arrange(arrangeGrob(spin_test_tracts_final, left = y.grob, bottom = x.grob, right = space.grob))

only include shifts that are 33 nodes or more shifted from the original tract profiles data

rotate_tractprofiles <- function(tract, num_samples = 100, nrot = 10000, seed = NULL) {
  tract_profile <- bilat_gm_dist_toplot %>% filter(tractID==tract)
  tract_profile <- tract_profile$dist_to_gm
  original_length <- length(tract_profile)
  
  # create an extended profile (functionally similar to creating a 1D circle of our data)
  extended_profile <- c(tract_profile, rev(tract_profile[-length(tract_profile)]), tract_profile[-1])
  extended_n <- length(extended_profile)
 
  perm.id <- matrix(NA, nrow = extended_n - original_length + 1, ncol = num_samples)
  
  # starting at each point in my length 10k data, sample 100 equidistant points along the entire tract. 
  # this is how i'm generating the permutated or spun data. By shifting the window along my 'extended_profile' (functionally equivalent to 'spinning the dial' on a wheel),
  # i can sample 100 equidistant points along the tract that preserves spatial autocorrelation. But the data is now shifted!
  #print(paste("Rotating tract profiles data for", tract))
  for (shift in 1:(extended_n - original_length + 1)) {
    sample_indices <- seq(shift, by = original_length / num_samples, length.out = num_samples) %% (extended_n - 1) + 1
    shifted_data <- extended_profile[sample_indices]
    perm.id[shift, ] <- shifted_data
  }
  
  perm.id_df <- data.frame(perm.id)
  perm.id_df <- perm.id_df[!duplicated(perm.id_df),] 
  # remove the first and last 33% shifts (only take shifts that are 33 or more nodes away from node=0 and node=99)
  first_starting_node <- round(original_length/3)
  last_starting_node <- original_length - first_starting_node - 1
  first_starting_node_extended <- last_starting_node + first_starting_node*2 + 1
  last_starting_node_extended <- first_starting_node_extended + (last_starting_node - first_starting_node) 
  perm.id_df <- perm.id_df[c(first_starting_node:last_starting_node, first_starting_node_extended:last_starting_node_extended),] 
  
 
  # set seed for reproducibility
  if (!is.null(seed)) {
    set.seed(seed)
  }
  
  # randomly shuffle the order of the rows and then extract nrot number of permutations
  # each row = 1 rotation
  # each column = node position
  perm.id_df_shuffled <- perm.id_df[sample(nrow(perm.id_df)), ]
  perm.id_final <- perm.id_df_shuffled[1:nrot,]
 
 
  return(perm.id_final)
}

(max number of unique spins = 6600)

perm.id <- lapply(unique(bilat_gm_dist_toplot$tractID), rotate_tractprofiles, nrot=5000, seed=123) 
names(perm.id) <- unique(bilat_gm_dist_toplot$tractID)
 
spin_test_tracts <- lapply(unique(bilat_gm_dist_toplot$tractID), spin_test_wrapper)
names(spin_test_tracts) <- unique(bilat_gm_dist_toplot$tractID)
  • even more negatively skewed for the reason above
  • buuuut everything is super significant!! LOL p-value is probably deflated though right?
# arrange by AP, AP_frontal_temporal, RL (forceps), and SI
x.grob <- textGrob("1D Spin Null Distribution of Pearson Correlations", 
                   gp=gpar(col="black", fontsize=14))
y.grob <- textGrob("Frequency", 
                   gp=gpar(col="black", fontsize=14), rot=90)

space.grob <- textGrob(" ", 
                   gp=gpar(col="black", fontsize=28), rot=90)

spin_test_tracts_final <- arrange_by_orientation(spin_test_tracts, "Distance to Cortex vs. Age Effect")
grid.arrange(arrangeGrob(spin_test_tracts_final, left = y.grob, bottom = x.grob, right = space.grob))